ABSTRACT

This Monte Carlo study examines whether, given
various numbers of variables, treatments, and sample sizes, in a
one-way multivariate analysis of variance, Type I error rates of the
test approximations provided by the BMDP program, the Statistical
Analysis System (SAS), and the Statistical Package for the Social
Sciences (SPSS) for Roy's largest root, Hotelling's trace, Wilks'
likelihood ratio, and P. C. S. Pillai's trace meet the stringent
criterion of J. V. Bradley (1978) for robustness. For each of 85
conditions, the Type I error rate for each of the above statistical
tests was estimated based on 100,000 random samples per situation.
Results indicate that in multivariate analysis of variance studies
with relatively small numbers of subjects of around 15 per treatment
level, or fewer, the current probability levels reported by SAS and
SPSS are conservative for the F approximations based on Pillai's
trace and liberal for the F approximations based on Hotelling's
trace. The BMDP 4V program does not report Pillai's trace and reports
accurate probability values for Hotelling's trace. All of these
programs report accurate values for the F approximations based on
Wilks' likelihood ratio and for Roy's largest root (SPSS does not
report Roy's largest root). Recommendations are made for specific
conditions. Two tables and five figures present analysis results.

Multivariate Test Statistics and
Their Approximations: Some Problems 

Objectives 

The purpose of this paper is to provide educational researchers with information
concerning the Type I error rates of the test statistic (F or χ²) approximations of the four
multivariate statistical tests reported by most statistical packages. The question
answered in this paper is:

Given various numbers of variables, numbers of treatments, and sample sizes in a
one-way multivariate analysis of variance, do the Type I error rates of the test statistic
approximations provided by BMDP, SAS, and SPSS for Roy's largest root, Hotelling's
trace, Wilks's likelihood ratio, and Pillai's trace meet Bradley's (1978) stringent
criterion for robustness?

Perspectives 

Recently we happened upon a data set where, using SPSS's program MANOVA (or
SAS's program GLM), we found a significant omnibus multivariate F (p = .044) based on
Hotelling's trace (α = .05), but a nonsignificant result (p = .0654) for this same statistic
when using BMDP's 4V program. Our results are illustrated in Table 1 for SPSS
(Hotellings line) and in Table 2 for BMDP (CHISQ line). The difference between the two
results was caused by the different approximation methods used by the programs to
arrive at their probability values. This led us to this study and to the use of Bradley's
(1978, p. 146) criterion for examining the Type I error rates of the approximation
methods under a variety of conditions.

Table 1

One-way MANOVA (Six Treatments, Six Dependent Variables,
Five Subjects Per Treatment) Output From SPSS (MANOVA)

Multivariate Tests of Significance (S = 5, M = 0, N = 8 1/2)

Test Name       Value    Approx. F   Hypoth. DF   Error DF   Sig. of F
Pillais       1.11157      1.09582        30.00     115.00        .354
Hotellings    2.78697      1.61645        30.00      87.00        .044
Wilks          .19299      1.32274        30.00      78.00        .164
Roys           .69570

Note. These results agree with those found using SAS (GLM).

Table 2

One-way MANOVA (Six Treatments, Six Dependent Variables,
Five Subjects Per Treatment) Output From BMDP4V

STATISTIC                  F       DF               P
L RATIO =  0.192990     1.32    30.00, 78.00     0.1638
TRACE   =  2.78697
TZSQ    = 64.1004
CHISQ   = 19.43                 30.226           0.0654
MXROOT  =  0.695699                              0.0197



Bradley's stringent criterion for robustness is that the actual level of significance be
within ± .1α of the nominal level of significance (α). For example, if α = .05, then the
approximation method should yield an actual level of significance that falls between .045
and .055. We felt that this level of precision would be necessary for an approximation
method found in a statistical package which is meant to be used in a wide variety of
research situations.
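
As a small illustration of the criterion just described, the following Python sketch computes the ± .1α band and labels an estimated Type I error rate as conservative, robust, or liberal in the sense used in this paper. The function names and the example rates are hypothetical and are not taken from Bradley (1978) or from our program.

```python
def bradley_stringent_limits(alpha):
    """Return (lower, upper) limits alpha - 0.1*alpha and alpha + 0.1*alpha."""
    return 0.9 * alpha, 1.1 * alpha


def classify_rate(actual, alpha=0.05):
    """Label an estimated Type I error rate relative to the stringent band.

    'conservative' : actual rate falls below the lower limit
    'liberal'      : actual rate falls above the upper limit
    'robust'       : actual rate lies inside the band
    """
    lower, upper = bradley_stringent_limits(alpha)
    if actual < lower:
        return "conservative"
    if actual > upper:
        return "liberal"
    return "robust"


if __name__ == "__main__":
    lower, upper = bradley_stringent_limits(0.05)
    print(f"band for alpha = .05: {lower:.3f} to {upper:.3f}")
    # Hypothetical estimated rates, for illustration only.
    for rate in (0.040, 0.050, 0.060):
        print(rate, classify_rate(rate))
```

With α = .05 the band runs from .045 to .055, which is the pair of limits used throughout the remainder of the paper.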

The F approximation for Roy's largest root is based on Harris (1975) in BMDP and Pillai
(1965) for SPSS and SAS; for Hotelling's trace, BMDP's chi-square approximation is
based on Tiku (1971) and SAS and SPSS's F approximation is based on Pillai (1960);
Wilks's likelihood ratio F approximation for all programs is based on Rao (1973); Pillai's
trace F approximation is based on Pillai (1960) for SAS and SPSS.
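
The F approximations named above are not written out in this paper. The sketch below therefore uses the forms commonly given in multivariate textbooks and package documentation, which we assume correspond to the Pillai (1960) and Rao (1973) approximations used by SPSS and SAS; the function names are hypothetical. Applied to the statistic values in Table 1 (six dependent variables, six treatments, five subjects per treatment, so s = 5, m = 0, n = 8.5), these forms give F values and degrees of freedom that agree with the table to rounding. Roy's largest root and Tiku's chi-square approximation are not covered here.

```python
import math


def pillai_f(v, s, m, n):
    """F approximation for Pillai's trace V (assumed textbook form)."""
    df1 = s * (2 * m + s + 1)
    df2 = s * (2 * n + s + 1)
    f = (2 * n + s + 1) / (2 * m + s + 1) * v / (s - v)
    return f, df1, df2


def hotelling_f(u, s, m, n):
    """F approximation for the Hotelling-Lawley trace U (assumed textbook form)."""
    df1 = s * (2 * m + s + 1)
    df2 = 2 * (s * n + 1)
    f = df2 * u / (s * s * (2 * m + s + 1))
    return f, df1, df2


def wilks_rao_f(lam, p, q, error_df):
    """Rao's F approximation for Wilks's lambda.

    p = number of dependent variables, q = hypothesis degrees of freedom.
    """
    pq = p * q
    denom = p * p + q * q - 5
    t = math.sqrt((pq * pq - 4) / denom) if denom > 0 else 1.0
    r = error_df - (p - q + 1) / 2
    u = (pq - 2) / 4
    df1 = pq
    df2 = r * t - 2 * u
    root = lam ** (1 / t)
    f = (1 - root) / root * df2 / df1
    return f, df1, df2


# Design of Tables 1 and 2.
p, k, n_per_group = 6, 6, 5
q = k - 1                          # hypothesis df
error_df = k * (n_per_group - 1)   # within-groups df
s = min(p, q)
m = (abs(p - q) - 1) / 2
n = (error_df - p - 1) / 2

if __name__ == "__main__":
    print("Pillai   ", pillai_f(1.11157, s, m, n))
    print("Hotelling", hotelling_f(2.78697, s, m, n))
    print("Wilks    ", wilks_rao_f(0.19299, p, q, error_df))
```

Because the three approximations take the same data to F values referred to different denominator degrees of freedom (115, 87, and 78), it is not surprising that their Type I error behavior can differ when the error degrees of freedom are small.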

Olson (1976) urged researchers to provide the preceding information in their research
reports when he indicated:

In view of the differing robustness performances of the test criteria, researchers
who use the expression multivariate F should include a footnote specifying whose
approximation to which criterion was employed. (p. 584)



Methods/Data Source

A Monte Carlo study was undertaken based on a format developed by Olson (1975). A
FORTRAN program was written to generate discriminant score data based on no
violations of the multivariate analysis of variance (MANOVA) test assumptions. The
program used the IMSL (1987) random number generator, RNMVN, to generate
random score vectors from a multivariate normal distribution. Given the following
finite numbers of dependent variables, treatment levels, and sample sizes, and our
investigation of only the .05 level of significance, the preceding method of generating
data yields results that are generalizable to any real life situation where the data meet
the MANOVA assumptions and where there are no differences among the treatment
population means.

The study included the following combinations of dependent variables by treatment
levels: 2 × 3, 3 × 3, 3 × 6, 6 × 3, and 6 × 6. For each combination of dependent variables by
treatment levels, the equal n sample sizes per treatment level were 4 (1) 20, that is, 4
through 20 in steps of 1. In each of these eighty-five different conditions the Type I error
rates for Roy's largest root based on Harris (1975) and Pillai (1966); Hotelling's trace
based on Tiku (1971) and Pillai (1960); Wilks's likelihood ratio based on Rao (1973); and
Pillai's trace based on Pillai (1960) were estimated based on 100,000 random samples per
situation. Following the logic provided by Robey and Barcikowski (1992), this number of
samples would yield power of more than .90 of detecting (α = .05) a departure of .1α from
a nominal alpha of .05. Our FORTRAN program was run on Ohio's Cray Y-MP
Supercomputer.
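
The design just described can be enumerated in a few lines. The Python sketch below lists the five dependent-variable by treatment-level combinations crossed with group sizes 4 through 20, confirming the count of 85 conditions, and computes the binomial standard error of an estimated rate of .05 from 100,000 samples. This is only a rough precision check under a simple binomial assumption; it is not the power calculation of Robey and Barcikowski (1992), and the names used are hypothetical.

```python
import math

# (dependent variables, treatment levels) combinations studied.
DESIGNS = [(2, 3), (3, 3), (3, 6), (6, 3), (6, 6)]
# Equal group sizes 4 (1) 20, i.e. 4 through 20 in steps of 1.
GROUP_SIZES = range(4, 21)

conditions = [(p, k, n) for (p, k) in DESIGNS for n in GROUP_SIZES]


def monte_carlo_se(rate, replications):
    """Binomial standard error of a rejection rate estimated from independent samples."""
    return math.sqrt(rate * (1 - rate) / replications)


se = monte_carlo_se(0.05, 100_000)
# Half-width of Bradley's stringent band at alpha = .05, in standard-error units.
margin_in_se = 0.005 / se

if __name__ == "__main__":
    print("number of conditions:", len(conditions))
    print(f"standard error at .05: {se:.6f}")
    print(f".005 departure in SE units: {margin_in_se:.2f}")
```

Under these assumptions the standard error is roughly .0007, so a departure of .005 from the nominal level is several standard errors wide.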

Results

The Type I error rates based on the F approximations found for Roy's largest root and for
Wilks's likelihood ratio as well as the χ² approximation for Hotelling's trace met
Bradley's stringent criterion in all eighty-five cases. However, the F approximations
found for Hotelling's trace and for Pillai's trace failed to meet Bradley's stringent
criterion under various conditions with cell sample sizes between six and fifteen. The
latter points are illustrated in figures 1 through 6.

In figures 1 through 5 the actual levels of significance (labeled probability) and the
treatment sample sizes (labeled group size) are plotted based on the F approximations
for Roy's root, Pillai's trace, Wilks's lambda, the Hotelling-Lawley trace and Tiku's
chi-square approximation for the Hotelling-Lawley trace. In these figures Bradley's
stringent criterion for robustness is identified by lines at probability values of .045 and
.055, and the nominal level of significance is identified by a line at .05. The F
approximations for Roy's root, Wilks's lambda, and Tiku's chi-square approximation
for the Hotelling-Lawley trace all yield estimates of the nominal levels of significance
that fall between the lines at .045 and .055, i.e., they all yield estimates of the nominal
level of significance that meet Bradley's criterion. However, all five figures also contain
treatment sample sizes where estimates of the nominal levels of significance provided by
the F approximations of Pillai's trace are below .045 and the estimates of the nominal
levels of significance of the F approximation for the Hotelling-Lawley trace are above
.055, i.e., fail to meet Bradley's criterion.

In figure 6 we indicate the conditions and the treatment sample sizes prior to the
sample size where the estimates of the nominal levels of significance based on the F
approximations for the Hotelling-Lawley trace and of Pillai's trace meet Bradley's
criterion. For example, given two dependent variables and three treatment levels, the F
approximation for the Hotelling-Lawley trace yielded an estimate of the level of
significance that fails to meet Bradley's criterion when the sample size is 6 or less (see
cell 1,1 in figure 6), but would yield an estimate of significance that does meet Bradley's
criterion when the sample size is 7 or larger. For the same conditions the F
approximation for Pillai's trace yielded an estimate of the level of significance that fails
to meet Bradley's criterion when the sample size is 10 or less (see cell 1,1 in figure 6), but
would yield an estimate of significance that does meet Bradley's criterion when the
sample size is 11 or larger. The latter two results can also be observed by viewing the
plots for Hotelling's trace and Pillai's trace in figure 1.



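[Figure 6: grid of dependent variables (rows) by treatments (columns: 3, 6). Cell 1,1
(two dependent variables, three treatments) reads 6 (H) and 10 (P). The remaining
entries, in extraction order, are 8, 9, 11, 9, 12, 15, 11, 13; their cell positions are not
recoverable from this copy.]

Figure 6. Sample size prior to when the test statistic
yielded estimates of the nominal level of significance that
meet Bradley's stringent criterion. The first sample size
in each cell is based on Hotelling's Trace (H) and the second
sample size is based on Pillai's Trace (P).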



Conclusions 

The results indicate that in multivariate analysis of variance studies with relatively
small numbers of subjects of around 15 per treatment level or fewer, the current
probability values reported by SAS (GLM) and SPSS (MANOVA) are conservative for the
F approximations based on Pillai's trace and liberal for the F approximations based on
Hotelling's trace. The BMDP4V program does not report Pillai's trace and reports
accurate probability values for Hotelling's trace. All of these programs report accurate
values for the F approximations based on Wilks's likelihood ratio and for Roy's largest
root (except SPSS, which does not report a probability value for Roy's largest root).
Unfortunately, many MANOVA studies with small subject-variable ratios exist in the
social science literature (Olson, 1976). It would be unfortunate if the authors of these
studies used Pillai's trace and found no significant result when one existed or used
Hotelling's trace and found a significant result when none existed.

Future research. We recommend that future research in this area complete the design
shown in figure 6 with the addition of 9 and 12 treatment levels. Furthermore, power
calculations for a wide variety of designs have been recently presented by Muller,
LaVange, Ramey, and Ramey (1992) using the F approximations for Hotelling's trace
and Pillai's trace that we have found yield poor approximations with small sample
sizes. Future research which examines the performance of these approximations with
large effect sizes and small sample sizes is warranted.

Recommendation. Given a small subject-variable ratio and no violations of
assumptions, we recommend that researchers consider the probability values provided
by the F approximations for Roy's largest root, Wilks's likelihood ratio, or Tiku's χ²
approximation for Hotelling's trace (as found in BMDP4V). Furthermore, we
recommend that all statistical packages use Tiku's χ² approximation for Hotelling's
trace. Given violations of the MANOVA tests' assumptions (Olson, 1975, 1976), we
recommend the use of F approximations based on Wilks's lambda (Stevens, 1979) or
Pillai's trace (recognizing that this will be conservative). For all conditions, we call for
an improved estimate of the F approximation for Pillai's trace.